Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this project, I used Python and Keras with TensorFlow backend to classify traffic signs.

Dataset used: German Traffic Sign Dataset. This dataset contains more than 50,000 images of German traffic signs across 43 classes.

I was able to reach a 96.76% validation accuracy, and a 94.49% testing accuracy.

Pipeline:

  • Load The Data.
  • Dataset Summary & Exploration
  • Design and Test Model Architecture
    • Data Preprocessing
      • Image Augmentation
        • Zoom
        • Rotate
        • Skew
      • Normalization
    • Model Training and Evaluation
      • LeNet-5 Architecture
      • MiniVGGNet Architecture
      • Testing the Model Using the Test Set
  • Test the Model on New Images

Step 0: Load The Data

The dataset can be downloaded from here. It contains the following three pickle files:

  • train.p: The training set.
  • test.p: The testing set.
  • valid.p: The validation set.

Each of the 3 pickle files contains a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2), representing the coordinates of a bounding box around the sign in the image. Note: these coordinates refer to the original images, while the pickled data contains resized (32 x 32) versions of these images.

All images in the dataset are already resized to 32 x 32 pixels.

The data (train.p, test.p, valid.p) must be located in the data folder one level above the notebook!

In [4]:
# Load pickled data
import pickle
import os

training_file = '../data/train.p'
validation_file = '../data/valid.p'
testing_file = '../data/test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']
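The 'sizes' and 'coords' entries described above are not used in this project, but they can be read the same way. A minimal sketch with a stand-in dictionary that mimics the documented pickle structure (the real values would come from train, valid, or test loaded above):

```python
import numpy as np

# stand-in for one loaded pickle dictionary (same keys as train/valid/test)
sample = {
    'features': np.zeros((2, 32, 32, 3), dtype=np.uint8),  # resized images
    'labels': np.array([14, 17]),                          # class ids
    'sizes': [(45, 48), (52, 50)],                         # original (width, height)
    'coords': [(5, 5, 40, 43), (6, 6, 46, 44)],            # (x1, y1, x2, y2) in the ORIGINAL image
}

for label, size, box in zip(sample['labels'], sample['sizes'], sample['coords']):
    print('class {}: original size {}, bounding box {}'.format(label, size, box))
```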

Load traffic sign names from signnames.csv and map the class-IDs to traffic sign names

In [5]:
import csv
import numpy as np

# load class-ids and sign names from csv file
def load_signnames_from_csv(filename):
    rows = []
    with open(filename) as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # skip header
        for row in reader:
            class_id = row[0]
            sign_name = row[1]
            rows.append((class_id, sign_name))

    return np.array(rows)


sign_names = load_signnames_from_csv('signnames.csv')
num_classes = len(sign_names)
print('Number of classes: {}'.format(num_classes))
print()

for sign in sign_names:
    print('{:4d}: {}'.format(int(sign[0]), sign[1]))
Number of classes: 43

   0: Speed limit (20km/h)
   1: Speed limit (30km/h)
   2: Speed limit (50km/h)
   3: Speed limit (60km/h)
   4: Speed limit (70km/h)
   5: Speed limit (80km/h)
   6: End of speed limit (80km/h)
   7: Speed limit (100km/h)
   8: Speed limit (120km/h)
   9: No passing
  10: No passing for vehicles over 3.5 metric tons
  11: Right-of-way at the next intersection
  12: Priority road
  13: Yield
  14: Stop
  15: No vehicles
  16: Vehicles over 3.5 metric tons prohibited
  17: No entry
  18: General caution
  19: Dangerous curve to the left
  20: Dangerous curve to the right
  21: Double curve
  22: Bumpy road
  23: Slippery road
  24: Road narrows on the right
  25: Road work
  26: Traffic signals
  27: Pedestrians
  28: Children crossing
  29: Bicycles crossing
  30: Beware of ice/snow
  31: Wild animals crossing
  32: End of all speed and passing limits
  33: Turn right ahead
  34: Turn left ahead
  35: Ahead only
  36: Go straight or right
  37: Go straight or left
  38: Keep right
  39: Keep left
  40: Roundabout mandatory
  41: End of no passing
  42: End of no passing by vehicles over 3.5 metric tons

Step 1: Dataset Summary & Exploration

Provide a Basic Summary of the Data Set Using Python, Numpy and/or Pandas

In [6]:
### Replace each question mark with the appropriate value. 
### Use python, pandas or numpy methods rather than hard coding the results
import numpy as np

# Number of training examples
n_train = len(X_train)

# Number of validation examples
n_validation = len(X_valid)

# Number of testing examples.
n_test = len(X_test)

# What's the shape of a traffic sign image?
image_shape = X_train[0].shape

# How many unique classes/labels there are in the dataset.
n_classes = len(np.unique(y_train))

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
print()
print("Image Shape: {}".format(X_train[0].shape) )
Number of training examples = 34799
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43

Image Shape: (32, 32, 3)

Include an exploratory visualization of the dataset

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

Display Train Data

First, examine how often each class is represented in the dataset, which is easiest to achieve with a histogram.

In [8]:
import matplotlib.pyplot as plt
%matplotlib inline 

# histogram of class frequency
fig, ax = plt.subplots()
hist, bins = np.histogram(y_train, bins=n_classes)
center = np.array(range(0, n_classes))
median = np.median(hist)
ax.plot(bins, np.full(len(bins), median, dtype=int), '--', color='red')
ax.bar(center, hist, align='center', width=0.8)
ax.set_title("Number of images per class")
ax.set_xlabel('Class')
ax.set_ylabel('Number of images')
plt.text(n_classes+3, median, 'Median: {}'.format(median), color='red')
fig.tight_layout()
plt.show()

print()
print('Median of images per class: {}'.format(median))
Median of images per class: 540.0

If the classes in the dataset are represented very unevenly, the more frequent classes are favored during classification. To avoid this bias, each class in the training dataset should be represented by an approximately equal number of images.

The histogram above shows that the classes are distributed very unevenly, which adversely affects predictive accuracy. Therefore I decided to cap the number of images per class in the training dataset at the median number of images per class.

In [11]:
# cut number of images per class to the median number images per class
def equalize_images_per_class(data, labels, num_classes, threshold):
    images = []
    classes = []
    for class_id in range(0, num_classes):
        group = data[labels == class_id]
        if len(group) > threshold:
            group = group[:threshold]

        for image in group:
            images.append(image)
            classes.append(class_id)
    return np.array(images), np.array(classes)

X_train, y_train = equalize_images_per_class(X_train, y_train, num_classes, int(median))

# histogram of class frequency
fig, ax = plt.subplots()
hist, bins = np.histogram(y_train, bins=n_classes)
center = np.array(range(0, n_classes))
ax.bar(center, hist, align='center', width=0.8)
ax.set_title("Number of images per class")
ax.set_xlabel('Class')
ax.set_ylabel('Number of images')
fig.tight_layout()
plt.show()

The distribution is still not optimal, but much better than before. Unfortunately, some classes are still poorly represented, but it makes no sense to cut down to even fewer images, because as a rule of thumb you should have about 1000 images per class to get a good training result.

I tested the network with and without this class distribution adaptation. Although I achieved slightly better training results without the adaptation, the model recognized new images better with it.

Next we look at the images of the dataset. The next diagram shows ten randomly selected images of each class.

In [ ]:
import random

def show_dataset(X, y, sign_names, columns=10):
    classes = np.unique(y)

    # show image of 10 random data points
    fig, axs = plt.subplots(len(classes), columns, figsize=(15, 120))
    fig.subplots_adjust(hspace=0.3)
    axs = axs.ravel()
    for row, class_id in enumerate(classes):
        group = X[y == class_id]
        sign_name = sign_names[class_id]
        for col in range(columns):
            image = group[random.randint(0, len(group) - 1)]
            index = row * columns + col
            axs[index].axis('off')
            axs[index].set_title(sign_name)
            if len(image.shape) == 3:
                axs[index].imshow(image)
            else:
                axs[index].imshow(image, cmap='gray')
    plt.show()
            
show_dataset(X_train, y_train, np.unique(y_train).astype(str))

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

With the LeNet-5 solution from the lecture, you should expect a validation set accuracy of about 0.89. To meet specifications, the validation set accuracy will need to be at least 0.93. It is possible to get an even higher accuracy, but 0.93 is the minimum for a successful project submission.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture (is the network over or underfitting?)
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

Pre-process the Data Set

Minimally, the image data should be normalized so that the data has mean zero and equal variance. For image data, (pixel - 128)/ 128 is a quick way to approximately normalize the data and can be used in this project.
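As a minimal sketch of this quick normalization (a hypothetical helper for illustration only; the training pipeline later in this notebook normalizes to the range [0, 1] instead):

```python
import numpy as np

# Quick approximate normalization: shifts uint8 pixel values from [0, 255]
# to roughly [-1, 1) so the data has near-zero mean and comparable variance.
def quick_normalize(images):
    return (images.astype('float32') - 128.0) / 128.0

# dummy batch shaped like the dataset: (num examples, 32, 32, 3)
batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
normalized = quick_normalize(batch)
print(normalized.min(), normalized.max())  # values lie within [-1.0, 0.9921875]
```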

Other pre-processing steps are optional. You can try different techniques to see if it improves performance.

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project.

Grayscale Images

Many classification problems do not depend on the colors of the images. In such cases, the training process can usually be accelerated by converting the images to grayscale before training.

The following diagram shows the images of the training dataset as grayscale images.

In [ ]:
import cv2

# convert images to grayscale
def to_grayscale(images):
    result = []
    for image in images:
        result.append(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    return np.array(result)


train_gray = to_grayscale(X_train)
show_dataset(train_gray, y_train, np.unique(y_train).astype(str))

Local Histogram Equalization

This technique spreads out the most common intensity values in an image, improving low-contrast images.

In [28]:
import skimage.morphology as morp
from skimage.filters import rank

# apply local histogram equalization
def local_histogram_equalization(image):
    kernel = morp.disk(30)
    img_local = rank.equalize(image, selem=kernel)
    return img_local

train_equalized = np.array(list(map(local_histogram_equalization, train_gray)))
show_dataset(train_equalized, y_train, np.unique(y_train).astype(str))

I have tested my network with and without grayscale conversion and histogram equalization. With grayscale images and histogram equalization I could achieve slightly better training results, but when recognizing real images from the Internet I achieved better results with color images. So I decided to use color images for this project.

Image Augmentation

Data augmentation is a great technique for artificially enlarging a dataset by deriving new images from existing ones through random manipulations such as scaling, rotation, tilt, or noise.

This can be done by hand by building an augmentation pipeline, analogous to a preprocessing pipeline, that performs the appropriate manipulations. For example, OpenCV offers numerous functions for image manipulation.

But I prefer to use ready-made libraries like imgaug or Augmentor. Here the augmentation pipeline is described declaratively, which is very clear. For this project I use Augmentor, see https://augmentor.readthedocs.io/en/master/.

Zoom

I use a random zoom factor between 0.8 and 1.2 to simulate different distances between the camera and the signs.

In [18]:
import Augmentor

p = Augmentor.Pipeline()
p.zoom(probability=0.8, min_factor=0.8, max_factor=1.2)

for class_id in range(0, 10):
    y = np.nonzero(y_train == class_id)[0]
    x = X_train[y_train == class_id]

    datagen = p.keras_generator_from_array([x[0]], [class_id], batch_size=10)
    images, labels = next(datagen)
    show_dataset(images, labels, np.full(len(labels), class_id, dtype=int))

Rotate

I use a random rotation of +/- 15 degrees to simulate signs that appear slightly rotated.

In [32]:
p = Augmentor.Pipeline()
p.rotate(probability=0.8, max_left_rotation=15, max_right_rotation=15)

for class_id in range(0, 10):
    y = np.nonzero(y_train == class_id)[0]
    x = X_train[y_train == class_id]

    datagen = p.keras_generator_from_array([x[0]], [class_id], batch_size=10)
    images, labels = next(datagen)
    show_dataset(images, labels, np.full(len(labels), class_id, dtype=int))

Skew

I use a random horizontal or vertical skew to simulate different camera perspectives on traffic signs.

In [35]:
p = Augmentor.Pipeline()
p.skew(probability=0.8, magnitude=0.2)

for class_id in range(0, 10):
    y = np.nonzero(y_train == class_id)[0]
    x = X_train[y_train == class_id]

    datagen = p.keras_generator_from_array([x[0]], [class_id], batch_size=10)
    images, labels = next(datagen)
    show_dataset(images, labels, np.full(len(labels), class_id, dtype=int))

Final Augmentation Pipeline

The final augmentation pipeline for this project combines the three augmentation methods described above.

In [19]:
p = Augmentor.Pipeline()
p.zoom(probability=0.8, min_factor=0.8, max_factor=1.2)
p.rotate(probability=0.8, max_left_rotation=15, max_right_rotation=15)
p.skew(probability=0.8, magnitude=0.2)

for class_id in range(0, 10):
    y = np.nonzero(y_train == class_id)[0]
    x = X_train[y_train == class_id]

    datagen = p.keras_generator_from_array([x[0]], [class_id], batch_size=10)
    images, labels = next(datagen)
    show_dataset(images, labels, np.full(len(labels), class_id, dtype=int))

Train, Validate and Test the Model

The goal of the project is to design and train a model that achieves an accuracy of 93% or greater on the validation set.

The following method makes it very easy to switch the optimization method, so the network architecture can easily be tested with different optimizers to choose the best one.

In [20]:
from keras.optimizers import SGD, Adam, RMSprop, Adagrad, Adadelta

def get_optimizer(optimizer_method):
    if optimizer_method == "sdg":
        return SGD(lr=1e-2, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
    if optimizer_method == "rmsprop":
        return RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    if optimizer_method == "adam":
        return Adam(lr=0.001, decay=0.001 / num_epochs)
        # Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    if optimizer_method == "adagrad":
        return Adagrad(lr=0.01, epsilon=1e-08, decay=0.0)
    if optimizer_method == "adadelta":
        return Adadelta(lr=1.0, rho=0.95, epsilon=1e-08, decay=0.0)

I use three callback methods:

EarlyStopping:

Stops the training process early if no improvement has been achieved for several consecutive epochs.

ModelCheckpoint:

Saves the best model seen so far after each epoch.

ReduceLROnPlateau:

Automatically reduces the learning rate if no improvement has been achieved over several epochs.

In [21]:
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, ProgbarLogger

def get_callbacks(model_architecture, optimizer_method):
    model_filepath = './output/traffic_signs_model_{}_{}.h5'.format(model_architecture, optimizer_method)
    callbacks = [
        EarlyStopping(monitor='loss', min_delta=0, patience=5, mode='auto', verbose=1),
        ModelCheckpoint(model_filepath, monitor='val_loss', save_best_only=True, verbose=1),
        ReduceLROnPlateau(monitor='loss', factor=0.1, patience=2, verbose=1, mode='auto', min_delta=1e-4, cooldown=0,
                          min_lr=0)]
    return callbacks

After the training process, I plot the training history and save it as an image.

In [22]:
def plot_train_history(H, model_architecture, optimizer_method):
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(np.arange(0, len(H.history["loss"])), H.history["loss"], label="train_loss")
    plt.plot(np.arange(0, len(H.history["val_loss"])), H.history["val_loss"], label="val_loss")
    plt.plot(np.arange(0, len(H.history["acc"])), H.history["acc"], label="train_acc")
    plt.plot(np.arange(0, len(H.history["val_acc"])), H.history["val_acc"], label="val_acc")
    plt.title("Training Loss and Accuracy")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.legend()
    plt.savefig('./output/training-loss-and-accuracy_{}_{}.png'.format(model_architecture, optimizer_method))
    plt.show()

Here I normalize the images to the range 0.0 to 1.0, so that the data has approximately zero mean and equal variance. I only do this for the validation data, because the Augmentor generator we use for the training data already supplies normalized data.

Also, the class labels y_train and y_valid must be converted to one-hot labels.

In [17]:
import keras

# normalize data between 0.0 and 1.0
# don't normalize X_train, because this is already done by the augmentation
X_valid = X_valid.astype('float32') / 255

# convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

Here I configure the hyperparameters for training: the batch size, the maximum number of epochs, and the optimization method.

In [12]:
# hyperparameter for training
optimizer_method = 'sdg'
batch_size = 128
num_epochs = 100

LeNet Architecture

LeNet-5 is a convolutional network designed for handwritten and machine-printed character recognition. It was introduced by Yann LeCun in his paper Gradient-Based Learning Applied to Document Recognition in 1998. We can also use the LeNet architecture to classify traffic signs.

LeNet Architecture

All we need to do is change the input shape from (32, 32) to (32, 32, 3), because we use color images instead of grayscale images. Also, the output size must be changed from 10 to 43 classes.

In [25]:
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers.core import Dropout

# LeNet model architecture
class LeNet:
    @staticmethod
    def build(num_classes):
        model = Sequential()

        # Layer 1
        # Conv Layer 1 => 28x28x6
        model.add(Conv2D(filters=6, kernel_size=5, strides=1, activation='relu', input_shape=(32, 32, 3)))

        # Layer 2
        # Pooling Layer 1 => 14x14x6
        model.add(MaxPooling2D(pool_size=(2, 2)))

        # Layer 3
        # Conv Layer 2 => 10x10x16
        model.add(Conv2D(filters=16, kernel_size=5, strides=1, activation='relu'))

        # Layer 4
        # Pooling Layer 2 => 5x5x16
        model.add(MaxPooling2D(pool_size=2, strides=2))

        # Flatten
        model.add(Flatten())

        # Layer 5
        # Fully connected layer 1 => 120x1
        model.add(Dense(units=120, activation='relu'))

        model.add(Dropout(0.5))

        # Layer 6
        # Fully connected layer 2 => 84x1
        model.add(Dense(units=84, activation='relu'))

        model.add(Dropout(0.5))

        # Output Layer => num_classes x 1
        model.add(Dense(units=num_classes, activation='softmax'))

        # show and return the constructed network architecture
        model.summary()
        return model

Train LeNet Model

The fit_generator() method uses the data generator datagen created from the augmentation pipeline p, which provides newly augmented images for each batch. fit_generator() shuffles the dataset by default. After training, the training history is plotted and saved as a diagram.

In [26]:
model_architecture = 'lenet'

# image augmentation
datagen = p.keras_generator_from_array(X_train, y_train, batch_size=batch_size)

# build LeNet model
lenet_model = LeNet.build(num_classes)

# the function to optimize is the cross entropy between the true label and the output (softmax) of the model
lenet_model.compile(optimizer=get_optimizer(optimizer_method), loss='categorical_crossentropy', metrics=['accuracy'])

# train model
H = lenet_model.fit_generator(datagen,
                              validation_data=(X_valid, y_valid),
                              steps_per_epoch=len(X_train) / batch_size,
                              callbacks=get_callbacks(model_architecture, optimizer_method),
                              epochs=num_epochs,
                              verbose=2)

# plot and save the training loss and accuracy
plot_train_history(H, model_architecture, optimizer_method)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 6)         456       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 120)               48120     
_________________________________________________________________
dropout_1 (Dropout)          (None, 120)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 84)                10164     
_________________________________________________________________
dropout_2 (Dropout)          (None, 84)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 43)                3655      
=================================================================
Total params: 64,811
Trainable params: 64,811
Non-trainable params: 0
_________________________________________________________________
Epoch 1/100
 - 15s - loss: 3.6581 - acc: 0.0558 - val_loss: 3.3103 - val_acc: 0.1612

Epoch 00001: val_loss improved from inf to 3.31031, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 2/100
 - 9s - loss: 2.9759 - acc: 0.1681 - val_loss: 2.1524 - val_acc: 0.4129

Epoch 00002: val_loss improved from 3.31031 to 2.15242, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 3/100
 - 10s - loss: 2.2928 - acc: 0.3034 - val_loss: 1.6188 - val_acc: 0.5503

Epoch 00003: val_loss improved from 2.15242 to 1.61877, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 4/100
 - 10s - loss: 1.8820 - acc: 0.4131 - val_loss: 1.2967 - val_acc: 0.5946

Epoch 00004: val_loss improved from 1.61877 to 1.29667, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 5/100
 - 10s - loss: 1.6458 - acc: 0.4716 - val_loss: 1.1479 - val_acc: 0.6358

Epoch 00005: val_loss improved from 1.29667 to 1.14789, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 6/100
 - 10s - loss: 1.5050 - acc: 0.5186 - val_loss: 1.0294 - val_acc: 0.6884

Epoch 00006: val_loss improved from 1.14789 to 1.02943, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 7/100
 - 10s - loss: 1.3838 - acc: 0.5560 - val_loss: 0.9745 - val_acc: 0.7138

Epoch 00007: val_loss improved from 1.02943 to 0.97452, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 8/100
 - 9s - loss: 1.2630 - acc: 0.5996 - val_loss: 0.8590 - val_acc: 0.7381

Epoch 00008: val_loss improved from 0.97452 to 0.85896, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 9/100
 - 9s - loss: 1.1617 - acc: 0.6282 - val_loss: 0.7906 - val_acc: 0.7721

Epoch 00009: val_loss improved from 0.85896 to 0.79060, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 10/100
 - 10s - loss: 1.1096 - acc: 0.6459 - val_loss: 0.7175 - val_acc: 0.7955

Epoch 00010: val_loss improved from 0.79060 to 0.71746, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 11/100
 - 10s - loss: 1.0577 - acc: 0.6625 - val_loss: 0.6783 - val_acc: 0.8048

Epoch 00011: val_loss improved from 0.71746 to 0.67834, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 12/100
 - 10s - loss: 0.9974 - acc: 0.6858 - val_loss: 0.6908 - val_acc: 0.8027

Epoch 00012: val_loss did not improve from 0.67834
Epoch 13/100
 - 10s - loss: 0.9522 - acc: 0.6941 - val_loss: 0.6302 - val_acc: 0.8247

Epoch 00013: val_loss improved from 0.67834 to 0.63015, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 14/100
 - 10s - loss: 0.9332 - acc: 0.7063 - val_loss: 0.6239 - val_acc: 0.8104

Epoch 00014: val_loss improved from 0.63015 to 0.62389, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 15/100
 - 10s - loss: 0.8806 - acc: 0.7245 - val_loss: 0.6124 - val_acc: 0.8245

Epoch 00015: val_loss improved from 0.62389 to 0.61244, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 16/100
 - 10s - loss: 0.8831 - acc: 0.7300 - val_loss: 0.5731 - val_acc: 0.8345

Epoch 00016: val_loss improved from 0.61244 to 0.57308, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 17/100
 - 10s - loss: 0.8438 - acc: 0.7418 - val_loss: 0.5652 - val_acc: 0.8320

Epoch 00017: val_loss improved from 0.57308 to 0.56520, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 18/100
 - 12s - loss: 0.8222 - acc: 0.7428 - val_loss: 0.5019 - val_acc: 0.8463

Epoch 00018: val_loss improved from 0.56520 to 0.50186, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 19/100
 - 10s - loss: 0.8229 - acc: 0.7457 - val_loss: 0.5018 - val_acc: 0.8426

Epoch 00019: val_loss improved from 0.50186 to 0.50175, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 20/100
 - 10s - loss: 0.7806 - acc: 0.7560 - val_loss: 0.4700 - val_acc: 0.8619

Epoch 00020: val_loss improved from 0.50175 to 0.47003, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 21/100
 - 10s - loss: 0.7646 - acc: 0.7661 - val_loss: 0.4585 - val_acc: 0.8649

Epoch 00021: val_loss improved from 0.47003 to 0.45854, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 22/100
 - 10s - loss: 0.7363 - acc: 0.7736 - val_loss: 0.4385 - val_acc: 0.8769

Epoch 00022: val_loss improved from 0.45854 to 0.43850, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 23/100
 - 9s - loss: 0.7251 - acc: 0.7779 - val_loss: 0.4418 - val_acc: 0.8748

Epoch 00023: val_loss did not improve from 0.43850
Epoch 24/100
 - 9s - loss: 0.7143 - acc: 0.7852 - val_loss: 0.4391 - val_acc: 0.8753

Epoch 00024: val_loss did not improve from 0.43850
Epoch 25/100
 - 10s - loss: 0.6963 - acc: 0.7893 - val_loss: 0.4644 - val_acc: 0.8701

Epoch 00025: val_loss did not improve from 0.43850
Epoch 26/100
 - 10s - loss: 0.7046 - acc: 0.7883 - val_loss: 0.4066 - val_acc: 0.8878

Epoch 00026: val_loss improved from 0.43850 to 0.40663, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 27/100
 - 9s - loss: 0.6876 - acc: 0.7889 - val_loss: 0.4155 - val_acc: 0.8821

Epoch 00027: val_loss did not improve from 0.40663
Epoch 28/100
 - 9s - loss: 0.6705 - acc: 0.7998 - val_loss: 0.4336 - val_acc: 0.8757

Epoch 00028: val_loss did not improve from 0.40663
Epoch 29/100
 - 10s - loss: 0.6335 - acc: 0.8074 - val_loss: 0.3948 - val_acc: 0.8918

Epoch 00029: val_loss improved from 0.40663 to 0.39483, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 30/100
 - 9s - loss: 0.6508 - acc: 0.8059 - val_loss: 0.3919 - val_acc: 0.8800

Epoch 00030: val_loss improved from 0.39483 to 0.39189, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 31/100
 - 10s - loss: 0.6522 - acc: 0.8004 - val_loss: 0.4104 - val_acc: 0.8764

Epoch 00031: val_loss did not improve from 0.39189

Epoch 00031: ReduceLROnPlateau reducing learning rate to 0.0009999999776482583.
Epoch 32/100
 - 10s - loss: 0.5746 - acc: 0.8247 - val_loss: 0.3693 - val_acc: 0.8955

Epoch 00032: val_loss improved from 0.39189 to 0.36934, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 33/100
 - 9s - loss: 0.5405 - acc: 0.8370 - val_loss: 0.3349 - val_acc: 0.9034

Epoch 00033: val_loss improved from 0.36934 to 0.33487, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 34/100
 - 10s - loss: 0.5035 - acc: 0.8475 - val_loss: 0.3501 - val_acc: 0.8995

Epoch 00034: val_loss did not improve from 0.33487
Epoch 35/100
 - 10s - loss: 0.4999 - acc: 0.8471 - val_loss: 0.3435 - val_acc: 0.9023

Epoch 00035: val_loss did not improve from 0.33487
Epoch 36/100
 - 9s - loss: 0.4959 - acc: 0.8496 - val_loss: 0.3568 - val_acc: 0.9027

Epoch 00036: val_loss did not improve from 0.33487
Epoch 37/100
 - 10s - loss: 0.5007 - acc: 0.8523 - val_loss: 0.3264 - val_acc: 0.9134

Epoch 00037: val_loss improved from 0.33487 to 0.32641, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 38/100
 - 10s - loss: 0.4959 - acc: 0.8505 - val_loss: 0.3233 - val_acc: 0.9098

Epoch 00038: val_loss improved from 0.32641 to 0.32334, saving model to ./output/traffic_signs_model_lenet_sdg.h5

Epoch 00038: ReduceLROnPlateau reducing learning rate to 9.999999310821295e-05.
Epoch 39/100
 - 10s - loss: 0.4811 - acc: 0.8554 - val_loss: 0.3213 - val_acc: 0.9102

Epoch 00039: val_loss improved from 0.32334 to 0.32133, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 40/100
 - 9s - loss: 0.4841 - acc: 0.8545 - val_loss: 0.3191 - val_acc: 0.9100

Epoch 00040: val_loss improved from 0.32133 to 0.31911, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 41/100
 - 9s - loss: 0.4697 - acc: 0.8565 - val_loss: 0.3175 - val_acc: 0.9118

Epoch 00041: val_loss improved from 0.31911 to 0.31749, saving model to ./output/traffic_signs_model_lenet_sdg.h5
Epoch 42/100
 - 9s - loss: 0.4720 - acc: 0.8563 - val_loss: 0.3186 - val_acc: 0.9118

Epoch 00042: val_loss did not improve from 0.31749
Epoch 43/100
 - 9s - loss: 0.4848 - acc: 0.8549 - val_loss: 0.3165 - val_acc: 0.9136

Epoch 00043: val_loss improved from 0.31749 to 0.31653, saving model to ./output/traffic_signs_model_lenet_sdg.h5

Epoch 00043: ReduceLROnPlateau reducing learning rate to 9.999999019782991e-06.
Epoch 44/100
 - 10s - loss: 0.4815 - acc: 0.8559 - val_loss: 0.3168 - val_acc: 0.9127

Epoch 00044: val_loss did not improve from 0.31653
Epoch 45/100
 - 10s - loss: 0.4779 - acc: 0.8559 - val_loss: 0.3177 - val_acc: 0.9122

Epoch 00045: val_loss did not improve from 0.31653

Epoch 00045: ReduceLROnPlateau reducing learning rate to 9.99999883788405e-07.
Epoch 46/100
 - 9s - loss: 0.4807 - acc: 0.8556 - val_loss: 0.3177 - val_acc: 0.9120

Epoch 00046: val_loss did not improve from 0.31653
Epoch 00046: early stopping
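
The log above shows three Keras callbacks at work: a checkpoint that saves the model whenever `val_loss` improves, a learning-rate schedule that divides the rate by 10 on plateaus, and early stopping. The `get_callbacks` helper used in the training cell is defined earlier in the notebook; a minimal sketch of what it plausibly looks like (the exact `patience` values are assumptions, the monitor, factor, and file path follow the logged messages):

```python
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

def get_callbacks(model_architecture, optimizer_method):
    # file name matches the paths seen in the log output above
    model_path = './output/traffic_signs_model_{}_{}.h5'.format(
        model_architecture, optimizer_method)
    return [
        # save the model only when the validation loss improves
        ModelCheckpoint(model_path, monitor='val_loss',
                        save_best_only=True, verbose=1),
        # divide the learning rate by 10 when val_loss stops improving
        # (factor 0.1 matches the logged 1e-4 -> 1e-5 -> 1e-6 steps;
        #  patience=2 is an assumption)
        ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                          patience=2, verbose=1),
        # abort training after several epochs without improvement
        # (patience=3 is an assumption)
        EarlyStopping(monitor='val_loss', patience=3, verbose=1),
    ]
```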

Evaluate LeNet Model

Here I evaluate the trained LeNet model against the test data X_test and y_test. The test data also come from the German Traffic Sign dataset, but the network has never seen them before.

The evaluation calculates the loss and accuracy of the trained model. In addition, some test records are listed with their ground truth labels and predicted labels.

In [27]:
import keras
from keras.models import load_model

model_architecture = 'lenet'

with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_test, y_test = test['features'], test['labels']

# convert class vector to binary class matrix.
y_test = keras.utils.to_categorical(y_test, num_classes)

# normalize data between 0.0 and 1.0
X_test = X_test.astype('float32') / 255

# load trained model
lenet_model = load_model('./output/traffic_signs_model_{}_{}.h5'.format(model_architecture, optimizer_method))
print()
# print loss and accuracy of the trained model
loss, acc = lenet_model.evaluate(X_test, y_test, batch_size=batch_size, verbose=2)
print('Loss:     {:.2f}%'.format(loss * 100))
print('Accuracy: {:.2f}%'.format(acc * 100))
print()

# show the true and the predicted classes for a couple of items of the test dataset
y_pred = lenet_model.predict(X_test)

start = 110
count = 20
for i, (y_t, y_p) in enumerate(zip(y_test[start:start + count], y_pred[start:start + count])):
    print("{:4d} : True={: <2}  Predicted={: <2}  {}"
          .format(i + start, y_t.argmax(axis=-1), y_p.argmax(axis=-1),
                  y_t.argmax(axis=-1) == y_p.argmax(axis=-1)))
Loss:     37.68%
Accuracy: 89.68%

 110 : True=1   Predicted=1   True
 111 : True=14  Predicted=14  True
 112 : True=16  Predicted=16  True
 113 : True=10  Predicted=10  True
 114 : True=30  Predicted=20  False
 115 : True=3   Predicted=3   True
 116 : True=27  Predicted=27  True
 117 : True=29  Predicted=29  True
 118 : True=1   Predicted=1   True
 119 : True=17  Predicted=17  True
 120 : True=13  Predicted=13  True
 121 : True=7   Predicted=7   True
 122 : True=1   Predicted=2   False
 123 : True=8   Predicted=8   True
 124 : True=2   Predicted=2   True
 125 : True=10  Predicted=10  True
 126 : True=10  Predicted=10  True
 127 : True=30  Predicted=20  False
 128 : True=1   Predicted=1   True
 129 : True=6   Predicted=6   True

The result is not bad, but it misses the project's minimum accuracy goal of 93%!

MiniVGGNet Architecture

The VGG network architecture was introduced by Simonyan and Zisserman in their 2014 paper, Very Deep Convolutional Networks for Large-Scale Image Recognition. It has been one of the best-performing Convolutional Neural Networks on the ImageNet challenge in recent years.

VGGNet Architecture

This network is characterized by its simplicity, using only 3×3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. Two fully-connected layers, each with 4,096 nodes, are then followed by a softmax classifier.
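
The reason for stacking 3×3 convolutions is parameter efficiency: two stacked 3×3 layers cover the same 5×5 receptive field as a single 5×5 layer but with fewer weights. A small illustrative calculation (not from the notebook; bias terms ignored):

```python
# Weight count of one conv layer with `channels` input and output channels.
def conv_params(kernel, channels):
    return kernel * kernel * channels * channels

channels = 64
stacked_3x3 = 2 * conv_params(3, channels)  # two stacked 3x3 convs: 2 * 9 * C^2
single_5x5 = conv_params(5, channels)       # one 5x5 conv: 25 * C^2

print(stacked_3x3, single_5x5)  # 73728 102400
```

The stacked variant also interleaves an extra non-linearity, which is part of why deeper-but-thinner VGG-style networks work well.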

To improve on this result, I tried the MiniVGGNet architecture described in Chapter 15 of the book Deep Learning for Computer Vision by Adrian Rosebrock.

MiniVGGNet Architecture

To use the MiniVGGNet architecture described above for traffic sign classification, the size of the last fully connected layer must be changed from 10 to 43 classes.

In [15]:
from keras.layers import Activation, BatchNormalization

class MiniVGGNet:
    @staticmethod
    def build(num_classes):
        model = Sequential()

        chanDim = -1

        # first CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same", input_shape=(32, 32, 3)))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # second CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        # first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(512))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

        # softmax classifier
        model.add(Dense(num_classes))
        model.add(Activation("softmax"))

        # show and return the constructed network architecture
        model.summary()
        return model

Train MiniVGGNet Model

The training process is the same as described above; only the model construction changed from the LeNet to the MiniVGGNet model.

In [29]:
model_architecture = 'vggnet'

# image augmentation
datagen = p.keras_generator_from_array(X_train, y_train, batch_size=batch_size)

# build MiniVGGNet model
vggnet_model = MiniVGGNet.build(num_classes)

# the function to optimize is the cross entropy between the true label and the output (softmax) of the model
vggnet_model.compile(optimizer=get_optimizer(optimizer_method), loss='categorical_crossentropy', metrics=['accuracy'])

# train model
H = vggnet_model.fit_generator(datagen,
                               validation_data=(X_valid, y_valid),
                               steps_per_epoch=len(X_train) // batch_size,
                               callbacks=get_callbacks(model_architecture, optimizer_method),
                               epochs=num_epochs,
                               verbose=2)

# plot and save the training loss and accuracy
plot_train_history(H, model_architecture, optimizer_method)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 64)        18496     
_________________________________________________________________
activation_3 (Activation)    (None, 16, 16, 64)        0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
activation_4 (Activation)    (None, 16, 16, 64)        0         
_________________________________________________________________
batch_normalization_4 (Batch (None, 16, 16, 64)        256       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               2097664   
_________________________________________________________________
activation_5 (Activation)    (None, 512)               0         
_________________________________________________________________
batch_normalization_5 (Batch (None, 512)               2048      
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 43)                22059     
_________________________________________________________________
activation_6 (Activation)    (None, 43)                0         
=================================================================
Total params: 2,188,107
Trainable params: 2,186,699
Non-trainable params: 1,408
_________________________________________________________________
Epoch 1/100
 - 12s - loss: 2.3270 - acc: 0.4086 - val_loss: 1.4021 - val_acc: 0.6154

Epoch 00001: val_loss improved from inf to 1.40206, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 2/100
 - 11s - loss: 0.6436 - acc: 0.7934 - val_loss: 0.8639 - val_acc: 0.7397

Epoch 00002: val_loss improved from 1.40206 to 0.86387, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 3/100
 - 11s - loss: 0.3335 - acc: 0.8941 - val_loss: 0.6396 - val_acc: 0.7834

Epoch 00003: val_loss improved from 0.86387 to 0.63963, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 4/100
 - 11s - loss: 0.2353 - acc: 0.9264 - val_loss: 0.4227 - val_acc: 0.8841

Epoch 00004: val_loss improved from 0.63963 to 0.42265, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 5/100
 - 12s - loss: 0.1598 - acc: 0.9488 - val_loss: 0.3248 - val_acc: 0.9079

Epoch 00005: val_loss improved from 0.42265 to 0.32476, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 6/100
 - 11s - loss: 0.1298 - acc: 0.9590 - val_loss: 0.2771 - val_acc: 0.9136

Epoch 00006: val_loss improved from 0.32476 to 0.27707, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 7/100
 - 12s - loss: 0.1109 - acc: 0.9652 - val_loss: 0.1925 - val_acc: 0.9395

Epoch 00007: val_loss improved from 0.27707 to 0.19249, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 8/100
 - 11s - loss: 0.0981 - acc: 0.9693 - val_loss: 0.2561 - val_acc: 0.9259

Epoch 00008: val_loss did not improve from 0.19249
Epoch 9/100
 - 11s - loss: 0.0774 - acc: 0.9753 - val_loss: 0.2513 - val_acc: 0.9327

Epoch 00009: val_loss did not improve from 0.19249
Epoch 10/100
 - 11s - loss: 0.0709 - acc: 0.9776 - val_loss: 0.2523 - val_acc: 0.9399

Epoch 00010: val_loss did not improve from 0.19249
Epoch 11/100
 - 11s - loss: 0.0575 - acc: 0.9820 - val_loss: 0.1877 - val_acc: 0.9587

Epoch 00011: val_loss improved from 0.19249 to 0.18772, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 12/100
 - 11s - loss: 0.0536 - acc: 0.9832 - val_loss: 0.2129 - val_acc: 0.9494

Epoch 00012: val_loss did not improve from 0.18772
Epoch 13/100
 - 11s - loss: 0.0484 - acc: 0.9849 - val_loss: 0.2005 - val_acc: 0.9456

Epoch 00013: val_loss did not improve from 0.18772
Epoch 14/100
 - 11s - loss: 0.0436 - acc: 0.9871 - val_loss: 0.1917 - val_acc: 0.9594

Epoch 00014: val_loss did not improve from 0.18772
Epoch 15/100
 - 11s - loss: 0.0436 - acc: 0.9860 - val_loss: 0.2420 - val_acc: 0.9492

Epoch 00015: val_loss did not improve from 0.18772
Epoch 16/100
 - 11s - loss: 0.0439 - acc: 0.9860 - val_loss: 0.1307 - val_acc: 0.9642

Epoch 00016: val_loss improved from 0.18772 to 0.13070, saving model to ./output/traffic_signs_model_vggnet_sdg.h5

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0009999999776482583.
Epoch 17/100
 - 11s - loss: 0.0368 - acc: 0.9889 - val_loss: 0.1320 - val_acc: 0.9639

Epoch 00017: val_loss did not improve from 0.13070
Epoch 18/100
 - 11s - loss: 0.0292 - acc: 0.9913 - val_loss: 0.1264 - val_acc: 0.9653

Epoch 00018: val_loss improved from 0.13070 to 0.12638, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 19/100
 - 11s - loss: 0.0285 - acc: 0.9917 - val_loss: 0.1181 - val_acc: 0.9680

Epoch 00019: val_loss improved from 0.12638 to 0.11813, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 20/100
 - 11s - loss: 0.0299 - acc: 0.9912 - val_loss: 0.1412 - val_acc: 0.9624

Epoch 00020: val_loss did not improve from 0.11813
Epoch 21/100
 - 11s - loss: 0.0268 - acc: 0.9917 - val_loss: 0.1150 - val_acc: 0.9683

Epoch 00021: val_loss improved from 0.11813 to 0.11497, saving model to ./output/traffic_signs_model_vggnet_sdg.h5
Epoch 22/100
 - 11s - loss: 0.0267 - acc: 0.9920 - val_loss: 0.1186 - val_acc: 0.9676

Epoch 00022: val_loss did not improve from 0.11497
Epoch 23/100
 - 11s - loss: 0.0282 - acc: 0.9919 - val_loss: 0.1239 - val_acc: 0.9683

Epoch 00023: val_loss did not improve from 0.11497

Epoch 00023: ReduceLROnPlateau reducing learning rate to 9.999999310821295e-05.
Epoch 24/100
 - 11s - loss: 0.0244 - acc: 0.9928 - val_loss: 0.1227 - val_acc: 0.9676

Epoch 00024: val_loss did not improve from 0.11497
Epoch 25/100
 - 11s - loss: 0.0238 - acc: 0.9927 - val_loss: 0.1254 - val_acc: 0.9667

Epoch 00025: val_loss did not improve from 0.11497
Epoch 26/100
 - 11s - loss: 0.0227 - acc: 0.9933 - val_loss: 0.1256 - val_acc: 0.9669

Epoch 00026: val_loss did not improve from 0.11497
Epoch 27/100
 - 11s - loss: 0.0257 - acc: 0.9925 - val_loss: 0.1256 - val_acc: 0.9664

Epoch 00027: val_loss did not improve from 0.11497
Epoch 28/100
 - 11s - loss: 0.0247 - acc: 0.9930 - val_loss: 0.1257 - val_acc: 0.9662

Epoch 00028: val_loss did not improve from 0.11497

Epoch 00028: ReduceLROnPlateau reducing learning rate to 9.999999019782991e-06.
Epoch 29/100
 - 11s - loss: 0.0220 - acc: 0.9935 - val_loss: 0.1265 - val_acc: 0.9669

Epoch 00029: val_loss did not improve from 0.11497
Epoch 30/100
 - 11s - loss: 0.0244 - acc: 0.9929 - val_loss: 0.1251 - val_acc: 0.9671

Epoch 00030: val_loss did not improve from 0.11497
Epoch 31/100
 - 11s - loss: 0.0278 - acc: 0.9923 - val_loss: 0.1257 - val_acc: 0.9671

Epoch 00031: val_loss did not improve from 0.11497

Epoch 00031: ReduceLROnPlateau reducing learning rate to 9.99999883788405e-07.
Epoch 32/100
 - 11s - loss: 0.0219 - acc: 0.9933 - val_loss: 0.1248 - val_acc: 0.9671

Epoch 00032: val_loss did not improve from 0.11497
Epoch 33/100
 - 11s - loss: 0.0276 - acc: 0.9924 - val_loss: 0.1248 - val_acc: 0.9676

Epoch 00033: val_loss did not improve from 0.11497
Epoch 34/100
 - 11s - loss: 0.0233 - acc: 0.9930 - val_loss: 0.1258 - val_acc: 0.9673

Epoch 00034: val_loss did not improve from 0.11497

Epoch 00034: ReduceLROnPlateau reducing learning rate to 9.99999883788405e-08.
Epoch 35/100
 - 11s - loss: 0.0246 - acc: 0.9929 - val_loss: 0.1255 - val_acc: 0.9669

Epoch 00035: val_loss did not improve from 0.11497
Epoch 36/100
 - 11s - loss: 0.0269 - acc: 0.9919 - val_loss: 0.1258 - val_acc: 0.9673

Epoch 00036: val_loss did not improve from 0.11497

Epoch 00036: ReduceLROnPlateau reducing learning rate to 9.999998695775504e-09.
Epoch 37/100
 - 10s - loss: 0.0222 - acc: 0.9944 - val_loss: 0.1250 - val_acc: 0.9671

Epoch 00037: val_loss did not improve from 0.11497
Epoch 00037: early stopping

Evaluate MiniVGGNet Model

The evaluation is the same as already described above.

In [236]:
import keras
from keras.models import load_model

model_architecture = 'vggnet'

with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_test, y_test = test['features'], test['labels']

# convert class vector to binary class matrix.
y_test = keras.utils.to_categorical(y_test, num_classes)

# normalize data between 0.0 and 1.0
X_test = X_test.astype('float32') / 255

# load trained model
vggnet_model = load_model('./output/traffic_signs_model_{}_{}.h5'.format(model_architecture, optimizer_method))
print()
# print loss and accuracy of the trained model
loss, acc = vggnet_model.evaluate(X_test, y_test, batch_size=batch_size, verbose=2)
print('Loss:     {:.2f}%'.format(loss * 100))
print('Accuracy: {:.2f}%'.format(acc * 100))
print()

# show the true and the predicted classes for a couple of items of the test dataset
y_pred = vggnet_model.predict(X_test)

start = 110
count = 20
for i, (y_t, y_p) in enumerate(zip(y_test[start:start + count], y_pred[start:start + count])):
    print("{:4d} : True={: <2}  Predicted={: <2}  {}"
          .format(i + start, y_t.argmax(axis=-1), y_p.argmax(axis=-1),
                  y_t.argmax(axis=-1) == y_p.argmax(axis=-1)))    
Loss:     22.57%
Accuracy: 94.49%

 110 : True=1   Predicted=1   True
 111 : True=14  Predicted=14  True
 112 : True=16  Predicted=16  True
 113 : True=10  Predicted=10  True
 114 : True=30  Predicted=31  False
 115 : True=3   Predicted=3   True
 116 : True=27  Predicted=27  True
 117 : True=29  Predicted=29  True
 118 : True=1   Predicted=1   True
 119 : True=17  Predicted=17  True
 120 : True=13  Predicted=13  True
 121 : True=7   Predicted=7   True
 122 : True=1   Predicted=1   True
 123 : True=8   Predicted=8   True
 124 : True=2   Predicted=2   True
 125 : True=10  Predicted=10  True
 126 : True=10  Predicted=10  True
 127 : True=30  Predicted=20  False
 128 : True=1   Predicted=1   True
 129 : True=6   Predicted=6   True

We achieved an accuracy of 94.49%. Further tuning of the hyperparameters and improved data preprocessing could push this even higher, but the target of 93% has been reached.

Confusion Matrix

In [237]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]  # Normalize
plt.figure(figsize=(20, 10))
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.colorbar()
Out[237]:
<matplotlib.colorbar.Colorbar at 0x254e748b8d0>

Step 3: Test a Model on New Images

To give yourself more insight into how your model is working, download at least five pictures of German traffic signs from the web and use your model to predict the traffic sign type.

Load and Output the Images

I collected 20 images of German traffic signs from the Internet and stored them in the directory test_images.

In [115]:
import glob
import cv2
    
# show test images
filenames = glob.glob('./test_images/*.jpg')
num_files = len(filenames)
cols = 5
rows = int(num_files / cols)
if num_files % cols > 0:
    rows += 1

fig, axs = plt.subplots(rows, cols, figsize=(20, 15))
axs = axs.ravel()

for i, filename in enumerate(filenames):
    image = cv2.imread(filename)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    axs[i].axis('off')
    axs[i].imshow(image)
    axs[i].set_title(filename)

plt.show()

Predict the Sign Type for Each Image

Here I predict the classes for the 20 test images and output them together with the names of the traffic signs.

In [14]:
import glob
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
from keras.models import load_model

# read and preprocess test images
original_images = []
X_test = []
filenames = glob.glob('./test_images/*.jpg')
for filename in filenames:
    image = cv2.imread(filename)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    original_images.append(image)
    resized_image = cv2.resize(image, (32, 32), interpolation=cv2.INTER_AREA)
    X_test.append(resized_image)

X_test = np.array(X_test)

# normalize data between 0.0 and 1.0
X_test = X_test.astype('float32') / 255

# load trained vggnet model
model = load_model('./output/traffic_signs_model_{}_{}.h5'.format('vggnet', 'sdg'))

# predict
y_pred = model.predict(X_test)

# show test images with class predictions
num_files = len(filenames)
cols = 5
rows = int(num_files / cols)
if num_files % cols > 0:
    rows += 1

fig, axs = plt.subplots(rows, cols, figsize=(20, 15))
axs = axs.ravel()

for i, (filename, image, org_image) in enumerate(zip(filenames, X_test, original_images)):
    class_id = y_pred.argmax(axis=-1)[i]
    class_name = sign_names[class_id][1]
    axs[i].axis('off')
    axs[i].imshow(org_image)
    axs[i].set_title('{}: {}'.format(class_id, class_name))

plt.show()

Analyze Performance for Test Images

In [15]:
### Calculate the accuracy for these 20 new images.
### For example, if the model predicted 5 out of 20 signs correctly, it would be 25% accurate on these new images.

y_true = np.array([25, 17, 28, 1, 11, 22, 33, 14, 7, 18, 26, 1, 4, 11, 13, 12, 12, 13, 25, 36])
print("True:      " + str(y_true))
print("Predicted: " + str(y_pred.argmax(axis=-1)))
test_accuracy = sum(y_true == y_pred.argmax(axis=-1))/len(y_true)
print("Test Accuracy = {:.1f}%".format(test_accuracy*100))
True:      [25 17 28  1 11 22 33 14  7 18 26  1  4 11 13 12 12 13 25 36]
Predicted: [25 17 28  1 11 22 33 14  7 18 26  1  4 11 13 12 12 13 25 36]
Test Accuracy = 100.0%

Output Top 5 Softmax Probabilities For Each Image Found on the Web

For each of the new images, print out the model's softmax probabilities to show the certainty of the model's predictions (limit the output to the top 5 probabilities for each image).

In [16]:
### Print out the top five softmax probabilities for the predictions on the German traffic sign images found on the web. 

k = 5
n = len(filenames)
plt.figure(figsize=(15, 50))
plt.subplots_adjust(hspace=0.5)
for i, (filename, prob, org_image) in enumerate(zip(filenames, y_pred, original_images)):
    top_values_index = sorted(range(len(prob)), key=lambda p: prob[p])[-k:]
    class_id = prob.argmax(axis=-1)
    class_name = sign_names[class_id][1]
    plt.subplot(n, 2, 2 * i + 1)
    plt.imshow(original_images[i])
    plt.title(filename)
    plt.axis('off')
    plt.subplot(n, 2, 2 * i + 2)
    plt.barh(np.arange(1, 6, 1), prob[top_values_index])
    labels = np.array([sign_names[j] for j in top_values_index])
    plt.yticks(np.arange(1, 6, 1), labels[:, 1])
plt.show()
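
The `sorted(range(len(prob)), ...)` idiom above returns the top-k indices in ascending order of probability. An equivalent NumPy sketch (an alternative, not the notebook's code) that yields them in descending order, shown on a hypothetical softmax vector:

```python
import numpy as np

prob = np.array([0.05, 0.60, 0.10, 0.20, 0.02, 0.03])  # hypothetical softmax output
k = 5
top_k = np.argsort(prob)[::-1][:k]  # class ids sorted by descending probability

print(top_k)  # → [1 3 2 0 5]
print(prob[top_k])  # probabilities in descending order: 0.6, 0.2, 0.1, 0.05, 0.03
```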

Step 4 (Optional): Visualize the Neural Network's State with Test Images

This section is not required to complete the project but serves as an additional exercise for understanding the output of a neural network's weights. While neural networks can be a great learning device, they are often referred to as a black box. We can better understand what the weights of a neural network look like by plotting their feature maps. After successfully training your neural network, you can see what its feature maps look like by plotting the output of the network's weight layers in response to a test stimulus image. From these plotted feature maps, it is possible to see which characteristics of an image the network finds interesting. For a sign, the inner feature maps may react with high activation to the sign's boundary outline or to the contrast of the sign's painted symbol.

In [18]:
# https://github.com/philipperemy/keras-activations
import keras.backend as K
import numpy as np
import matplotlib.pyplot as plt

def get_activations(model, model_inputs):
    outputs = [layer.output for layer in model.layers]  
    funcs = [K.function([model.input] + [K.learning_phase()], [out]) for out in outputs]  # evaluation functions
    list_inputs = [[model_inputs], 0.]
    activations = [func(list_inputs)[0] for func in funcs]
    layer_names = [output.name for output in outputs]
    result = dict(zip(layer_names, activations))
    return result

def display_activations(activations):
    layer_names = list(activations.keys())
    activation_maps = list(activations.values())
    batch_size = activation_maps[0].shape[0]
    
    for i, activation_map in enumerate(activation_maps):
        print('Activation map {}'.format(i))
        shape = activation_map.shape
        
        if len(shape) == 4:
            activations = np.hstack(np.transpose(activation_map[0], (2, 0, 1)))
        elif len(shape) == 2:
            # try to make it square as much as possible. we can skip some activations.
            activations = activation_map[0]
            num_activations = len(activations)

            if num_activations > 1024:  # too hard to display it on the screen.
                square_param = int(np.floor(np.sqrt(num_activations)))
                activations = activations[0: square_param * square_param]
                activations = np.reshape(activations, (square_param, square_param))
            else:
                activations = np.expand_dims(activations, axis=0)
        else:
            raise Exception('len(shape) = 3 has not been implemented.')

        fig, ax = plt.subplots(figsize=(30, 30))
        plt.title(layer_names[i])
        ax.imshow(activations, interpolation='None', cmap='viridis')
        plt.show()
        
        
plt.imshow(X_test[0])
        
activations = get_activations(model, X_test[0])
display_activations(activations)
Activation map 0
Activation map 1
Activation map 2
Activation map 3
Activation map 4
Activation map 5
Activation map 6
Activation map 7
Activation map 8
Activation map 9
Activation map 10
Activation map 11
Activation map 12
Activation map 13
Activation map 14
Activation map 15
Activation map 16
Activation map 17
Activation map 18
Activation map 19
Activation map 20
Activation map 21
Activation map 22